Enabling Work-conserving Bandwidth Guarantees for Multi-tenant Datacenters via Dynamic Tenant-Queue Binding
Today's cloud networks are shared among many tenants. Bandwidth guarantees
and work conservation are two key properties to ensure predictable performance
for tenant applications and high network utilization for providers. Despite
significant efforts, very little prior work achieves both properties
simultaneously, even though some of it claims to.
In this paper, we present QShare, an in-network solution to achieve
bandwidth guarantees and work conservation simultaneously. QShare leverages
weighted fair queuing on commodity switches to slice network bandwidth for
tenants, and solves the challenge of queue scarcity through balanced tenant
placement and dynamic tenant-queue binding. QShare is readily implementable
with existing switching chips. We have implemented a QShare prototype and
evaluated it via both testbed experiments and simulations. Our results show
that QShare ensures bandwidth guarantees while driving network utilization to
over 91% even under unpredictable traffic demands.
Comment: The initial work is published in IEEE INFOCOM 201
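To make the two properties in the QShare abstract concrete, the following is a minimal sketch of idealized (fluid-model) weighted fair sharing on a single link. It is not QShare's algorithm or switch configuration; the function name `wfq_rates`, the tenant names, and all numbers are hypothetical and chosen only for illustration. Each tenant's weight stands in for its bandwidth guarantee, and capacity left unused by one tenant is redistributed to the others in proportion to their weights, which is what work conservation means here.

```python
def wfq_rates(capacity, weights, demands):
    """Idealized weighted fair sharing of one link (fluid model, illustrative only).

    capacity: link capacity (e.g. in Gbps)
    weights:  tenant -> weight (stands in for the tenant's guaranteed share)
    demands:  tenant -> offered load, in the same unit as capacity
    Returns tenant -> allocated rate.
    """
    eps = 1e-9
    rates = {t: 0.0 for t in weights}
    active = {t for t in weights if demands[t] > 0}
    remaining = float(capacity)

    while active and remaining > eps:
        total_w = sum(weights[t] for t in active)
        # Offer every still-active tenant its weighted share of what is left.
        share = {t: remaining * weights[t] / total_w for t in active}
        # Tenants whose demand fits inside their share are fully served.
        satisfied = {t for t in active if demands[t] <= share[t] + eps}
        if not satisfied:
            # Everyone is backlogged: split the remainder by weight and stop.
            for t in active:
                rates[t] = share[t]
            break
        for t in satisfied:
            rates[t] = demands[t]
            remaining -= demands[t]
        # Leftover capacity is re-offered to the backlogged tenants.
        active -= satisfied
    return rates


if __name__ == "__main__":
    # Hypothetical 10 Gbps link; weights mirror guarantees of 5, 3 and 2 Gbps.
    weights = {"A": 5, "B": 3, "C": 2}
    demands = {"A": 1.0, "B": 10.0, "C": 10.0}
    print(wfq_rates(10.0, weights, demands))
```

In this example tenant A uses only 1 of its 5 Gbps, so B and C each receive more than their guaranteed 3 and 2 Gbps while the link stays fully used. On real switches the number of hardware queues per port is small, which is the queue-scarcity problem the abstract says QShare addresses through tenant placement and dynamic tenant-queue binding; this sketch does not model that part.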
Deadlocks in Datacenter Networks: Why Do They Form, and How to Avoid Them
Driven by the need for ultra-low latency, high throughput and low CPU overhead, Remote Direct Memory Access (RDMA) is being deployed by many cloud providers. To deploy RDMA in Ethernet networks, Priority-based Flow Control (PFC) must be used. PFC, however, makes Ethernet networks prone to deadlocks. Prior work on deadlock avoidance has focused on necessary conditions for deadlock formation, which leads to rather onerous and expensive solutions for deadlock avoidance. In this paper, we investigate sufficient conditions for deadlock formation, conjecturing that avoiding sufficient conditions might be less onerous.